Poster: | Richard BBC Archives | Date: | May 11, 2004 9:01pm |
Forum: | petabox | Subject: | Selective powering of large petasites |
The BBC has 85km of shelves, which translates very roughly (digitised at 25 Mb/s) to 200 TB/km => 17 PB. This is an overestimate for us, because not all our shelves hold video, and we have spare copies and VHS 'browse' copies. But it gives a round number: 10 PB for the BBC archive, and similar sizes for other major European broadcast archives.
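The arithmetic above can be checked with a short sketch. The 85 km, 200 TB/km and 25 Mb/s figures are from the post; decimal units (1 TB = 10^12 bytes) and the function names are assumptions made for illustration only.

```python
# Back-of-envelope check of the shelf-to-storage estimate.
# Figures from the post: 85 km of shelves, 200 TB per km, 25 Mb/s video.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 PB = 1000 TB).

SHELF_KM = 85
TB_PER_KM = 200
BITRATE_MBIT_S = 25


def total_petabytes(shelf_km: float, tb_per_km: float) -> float:
    """Total archive size in PB for a given shelf length and density."""
    return shelf_km * tb_per_km / 1000


def gigabytes_per_hour(bitrate_mbit_s: float) -> float:
    """Storage consumed by one hour of video at the given bitrate."""
    bytes_per_second = bitrate_mbit_s * 1_000_000 / 8
    return bytes_per_second * 3600 / 1e9


def hours_per_km(tb_per_km: float, bitrate_mbit_s: float) -> float:
    """Hours of video implied by one km of shelf at the given density."""
    return tb_per_km * 1000 / gigabytes_per_hour(bitrate_mbit_s)


if __name__ == "__main__":
    print(f"{total_petabytes(SHELF_KM, TB_PER_KM):.0f} PB in total")
    print(f"{gigabytes_per_hour(BITRATE_MBIT_S):.2f} GB per hour of video")
    print(f"{hours_per_km(TB_PER_KM, BITRATE_MBIT_S):.0f} hours per km of shelf")
```

At 25 Mb/s an hour of video is about 11 GB, so 200 TB/km corresponds to somewhat under 18,000 hours of material per kilometre of shelf.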
Our access to this material is selective: about 20% is accessed per year. Is it reasonable to have a 10 PB mass storage system with the majority (like 99% or more) of the drives switched off, applying power only when material on that drive is needed?
Archivists (without the technical knowledge of you lot) are already asking whether we should put hard drives on our shelves, simply because of their low cost.
Is there any basic flaw in designing an archive storage system with "selective" power, to vastly reduce power/cooling requirements?
Reply
Poster: | illtud_llgc | Date: | May 11, 2004 10:03pm |
Forum: | petabox | Subject: | Re: Selective powering of large petasites |
I would suggest that your needs would best be served by a tape library, or probably an HSM solution where a portion of the tape library's content is cached on disk. Presumably a few minutes' wait would not be a problem whilst accessing full-quality bitstreams. Tape libraries can also give you automated tape duplication for offsite storage (disaster recovery) and media refreshing (digital preservation). They also give far fewer problems with regard to power and climate issues (lots of disks equal lots of heat).
Here at the National Library of Wales we've only a smallish (tens of terabytes) tape library, but if cost is not an issue, Hitachi or Sony will gladly sell you larger solutions. Your main headache will be the development of the management and cataloging side.
Reply
Poster: | brewster | Date: | May 11, 2004 11:12pm |
Forum: | petabox | Subject: | Re: Selective powering of large petasites |
Thank you for the notes. We have had some experience with both tapes and hard drives at the Internet Archive and Television Archive, all of which points to keeping multiple copies and keeping them as active as possible.
The Television Archive, which has holdings closing in on a petabyte, started on tape and is now recording on hard drives that are kept offline. We don't have much experience of reading this collection back, apart from reading back Sept 11, 2001 to Sept 18, and it all worked fine.
A bigger tape experiment was trying to read 1000 DLT tapes recorded by the Internet Archive from 1996-1999. Faults made some tapes difficult to read, and some limited data was lost. It was also very slow to read (it took months of an administrator's time). Since then, all data is recorded onto hard drives that are kept online.
Spinning disks seem to have a failure rate of 6% per year, but we are working on better measurements. When a disk "fails" it does not always lose data, or sometimes loses only one block, so recovery can be effective. But this means we should not keep only one copy.
Our data protection system is to have at least 2 copies and preferably in distant locations (we have found that human error accounts for real loss as well, so having different administrative bodies helps). We keep copies in San Francisco and at the Library of Alexandria in Egypt.
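A rough sketch of why a second copy matters at a 6% annual failure rate follows. Only the 6% figure is from the post; the independence of failures, the constant failure rate, the 7-day repair window and the function names are assumptions chosen for illustration, and the result ignores the human-error losses mentioned above.

```python
# Rough model of disk loss with one copy versus two copies.
# From the post: roughly 6% of spinning disks fail per year.
# Assumptions (not from the post): failures are independent, the rate is
# constant over the year, and a failed disk is re-copied within
# `repair_days`. Treat the output as an order-of-magnitude estimate.

ANNUAL_FAILURE_RATE = 0.06


def expected_failures_per_year(n_disks: int,
                               afr: float = ANNUAL_FAILURE_RATE) -> float:
    """Expected number of disk failures per year in a pool of n_disks."""
    return n_disks * afr


def pair_loss_probability(afr: float = ANNUAL_FAILURE_RATE,
                          repair_days: float = 7.0) -> float:
    """Approximate yearly chance that both disks of a mirrored pair fail.

    One disk fails during the year (probability afr), and its partner then
    fails before the repair finishes (probability afr * repair_days / 365).
    The factor of 2 accounts for either disk failing first.
    """
    p_partner_fails_during_repair = afr * repair_days / 365.0
    return 2 * afr * p_partner_fails_during_repair


if __name__ == "__main__":
    print("failures per 1000 disks per year:", expected_failures_per_year(1000))
    print("single copy, yearly loss chance:", ANNUAL_FAILURE_RATE)
    print("two copies, yearly loss chance: ", pair_loss_probability())
```

Under these assumptions the second copy cuts the chance of losing a given disk's contents by a few orders of magnitude, provided failed disks are noticed and re-copied promptly.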
We are developing the petabox for exactly this reason. It is designed from the bottom up for reliability, low power, and low cost. The low cost means that we can have 2 or more copies of even large datasets.
I am in Europe for the next 2 months setting up a European Internet Archive that will host those machines in Amsterdam. I would be very interested in talking with anyone about what we are doing if this is of interest.
I can be reached directly at brewster (at) archive.org
-brewster
Digital Librarian
Reply
Poster: | JTW | Date: | May 12, 2004 4:42pm |
Forum: | petabox | Subject: | Re: Selective powering of large petasites |
I’ve been looking into having a system which has “on demand wake up” functionality for these reports. The “computers”, and more importantly the hard drives, spend most of their time turned off in a sleep mode, i.e. actually off, with a network signal to the BIOS bringing them back to life when required. For large archives this could save thousands in power consumption, reduce heat problems and should cut down hard drive failure rates. From a management side of things, if it’s possible to figure out in advance what is going to be accessed least, place it in this long-term storage computer system, while keeping the more highly demanded data in an always-on subsystem.
From a topology point of view everything seems to be online 24/7, but in actuality it’s the requests for data that drive what systems are currently powered up. I’m on the prowl to see if anybody else is doing this before I invest time into creating our own solution for feasibility testing with off-the-shelf components and Linux. Initially this would be 4 boxes, each with a single 100GB drive (400GB of data in total), and a master controller. This controller will mount all the subsystems with NFS or Samba, depending on which OS works best for hibernation / suspend modes. The idea is that it’s only when someone accesses data in those subdirectories on the master controller that the other computer will power up. Of course there needs to be some controlling program that knows the location of all the data you have, meaning you can’t just let the user start browsing the network looking for files, as the systems will end up starting and stopping every couple of minutes.
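A minimal sketch of the controller idea described above, assuming the "network signal to the BIOS" is Wake-on-LAN (the post does not name the mechanism). The catalog contents, host names, MAC addresses, paths and function names are all hypothetical. The magic packet layout (6 bytes of 0xFF followed by the target MAC repeated 16 times, usually sent as a UDP broadcast) is the standard Wake-on-LAN format.

```python
import socket

# Hypothetical catalog: path prefix on the master controller -> (host, MAC).
# A real controller would keep this in a database so that users never have
# to browse the sleeping boxes directly.
CATALOG = {
    "/archive/box1/": ("box1", "00:11:22:33:44:01"),
    "/archive/box2/": ("box2", "00:11:22:33:44:02"),
    "/archive/box3/": ("box3", "00:11:22:33:44:03"),
    "/archive/box4/": ("box4", "00:11:22:33:44:04"),
}


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on bad hex
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common choice)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


def locate(path: str, catalog=CATALOG):
    """Return the (host, mac) pair holding `path`, or None if unknown."""
    for prefix, box in catalog.items():
        if path.startswith(prefix):
            return box
    return None


def wake_for(path: str, catalog=CATALOG, send=send_magic_packet) -> str:
    """Wake only the box that holds `path` and return its host name."""
    box = locate(path, catalog)
    if box is None:
        raise KeyError(f"no box holds {path!r}")
    host, mac = box
    send(mac)
    return host
```

Looking up the catalog first means a request wakes exactly one box, which avoids the start-and-stop churn that uncontrolled browsing would cause. The `send` parameter is injectable so the lookup logic can be exercised without touching the network.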
Reply
Poster: | brewster | Date: | May 12, 2004 5:32pm |
Forum: | petabox | Subject: | Re: Selective powering of large petasites |
We have not done a large-scale test of this approach, but it sounds promising for many applications.
The petabox with spun-down disks would save 1/2 the power.
-brewster
Reply
Poster: | Jp7733 | Date: | Feb 5, 2022 11:50am |
Forum: | petabox | Subject: | Re: Selective powering of large petasites |
Reply
Poster: | Jp7733 | Date: | Feb 5, 2022 11:54am |
Forum: | petabox | Subject: | Re: Selective powering of large petasites |
Reply
Poster: | Rob TNA | Date: | May 12, 2004 11:07pm |
Forum: | petabox | Subject: | Re: Selective powering of large petasites |
From the sound of things, you might also want to take a look at the new MAID (Massive Array of Idle Disks) devices appearing on the market. The basic idea seems to be that you have a cabinet of 900 SATA disks of 250GB each, where only 25% of the drives are powered on and spinning at any one time.
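The cabinet figures above can be turned into a small sketch of a power budget. The 900 drives, 250GB per drive and 25% limit are from the post, and the 10 PB target is from the opening post. The least-recently-used spin-down policy and all names are assumptions for illustration, not a description of how any commercial MAID product actually works.

```python
import math
from collections import OrderedDict

# Figures from the post.
DRIVES_PER_CABINET = 900
DRIVE_GB = 250
POWERED_FRACTION = 0.25


def cabinet_capacity_tb() -> float:
    """Raw capacity of one cabinet in TB (decimal units assumed)."""
    return DRIVES_PER_CABINET * DRIVE_GB / 1000


def cabinets_needed(total_tb: float) -> int:
    """Cabinets required for total_tb of raw storage, ignoring redundancy."""
    return math.ceil(total_tb / cabinet_capacity_tb())


class PowerBudget:
    """Keeps at most `max_spinning` drives powered, spinning down the
    least recently used drive when a new one must be spun up.
    (LRU is an assumed policy, chosen here for simplicity.)"""

    def __init__(self, max_spinning: int):
        if max_spinning < 1:
            raise ValueError("max_spinning must be at least 1")
        self.max_spinning = max_spinning
        self._spinning = OrderedDict()  # drive id -> None, oldest first
        self.spin_ups = 0

    def access(self, drive_id) -> None:
        if drive_id in self._spinning:
            self._spinning.move_to_end(drive_id)  # mark as recently used
            return
        if len(self._spinning) >= self.max_spinning:
            self._spinning.popitem(last=False)  # spin down the LRU drive
        self._spinning[drive_id] = None
        self.spin_ups += 1

    def spinning(self) -> list:
        return list(self._spinning)


if __name__ == "__main__":
    budget = PowerBudget(int(DRIVES_PER_CABINET * POWERED_FRACTION))
    print("capacity per cabinet (TB):", cabinet_capacity_tb())
    print("drives allowed to spin:", budget.max_spinning)
    print("cabinets for 10 PB:", cabinets_needed(10_000))
```

The `spin_ups` counter is there because frequent power cycling is the main cost of this approach, so a real system would want to watch it.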
Reply
Poster: | Richard BBC Archives | Date: | May 12, 2004 11:34pm |
Forum: | petabox | Subject: | Re: Selective powering of large petasites |
Thanks for telling me about MAID. Another reader contacted me offline as well -- so it isn't a daft idea, and at least one company, COPAN, is promoting it commercially:
http://www-conf.slac.stanford.edu/dmw2004/slacworkshop/talks/guha/DMF2000-CopanSystems.pdf
Brewster, I'll email you directly about your European trip -- thank you very much for the offer.
-Richard BBC